{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/Maplemx/Agently/blob/main/playground/sql_generator.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Generate SQL according Meta Data of Database"
      ],
      "metadata": {
        "id": "dAzfqHDCAXZe"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Demo Description\n",
        "\n",
        "**Author:** Agently Team\n",
        "\n",
        "**Prompt Language:** English\n",
        "\n",
        "**Agent Components:** None\n",
        "\n",
        "**Description:**\n",
        "\n",
        "Answering questions from bosses or coworkers about business with unprepared data is too often a situation for a business analyst or a data scientist. Sometimes it takes too much time and chops your work time into too small chunks. But what more can you do? Bosses and coworkers cannot write SQL for themselves, so they are counting on you!\n",
        "\n",
        "But what if you have an agent to write SQL from natural language questions? Does that help you a lot? Take a look at this showcase and explore how Agently agent can help.\n",
        "\n",
        "作为一个商业分析师或者数据科学家，你是否经常遇到你的老板、同事在不停地问你一些数据报表中根本没有预处理的数据？作为一个分析研究的专家，你当然想把时间花在“分析”而不是“提数”上，而且总是处理“提数”这件事可能把你深度思考的时间切得粉碎。但你能怎么办呢？你的老板、同事又不会写SQL，他们只能找你这个“专家”。\n",
        "\n",
        "如果有一个Agent，能够帮你把自然语言问题转换成SQL，是不是就帮了大忙了？看看下面这个案例是如何使用Agently框架，写几行简单的代码解决这个问题的。"
      ],
      "metadata": {
        "id": "kyLFmv_l-aIx"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Step 1: Install Packages"
      ],
      "metadata": {
        "id": "nRsfMu4lAJZF"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!pip install -U Agently"
      ],
      "metadata": {
        "id": "nsst7pOAANlr"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Step 2: Demo Code"
      ],
      "metadata": {
        "id": "_-1gryYwASPM"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Information of Data Tables\n",
        "\n",
        "We got a MySQL database with two tables like this:\n",
        "\n",
        "|||Table 1: user_info|||\n",
        "---|---|---|---|---\n",
        "column name | column desc | data type | data example | annotation\n",
        "user_id | user identity code | string | \"u_123456\" |\n",
        "gender | user gender | int | 0 | 0: female, 1: male\n",
        "age | user age | int | 18 |\n",
        "customer_level | user's customer level | int | 4 | 1: copper, 2: silver, 3: gold, 4: platinum, 5: vvip\n",
        "reg_time | when did user register | timestamp | 1701340135721 |\n",
        "reg_channel | which channel did user come from | int | 0 | 0: natural, 1: game ads, 2: video ads, 3: streaming live ads, 4: offline post ads\n",
        "\n",
        "|||Table 2: order |||\n",
        "---|---|---|---|---\n",
        "column name | column desc | data type | data example | annotation\n",
        "order_id | order identity code | string | \"o_456789\" |\n",
        "customer_user_id | user identity of who paid this order | string | \"u_234567\" |\n",
        "item_name | item's name | string | \"HD Vanilla Icecream\" |\n",
        "item_class_tags | tags of item | JSON String | [\"food\", \"snack\", \"frozen\"] | standard JSON string format\n",
        "price | price of 1 item | float | 1.35 | unit: $\n",
        "order_item_number | how many items in this order | int | 3 |\n",
        "order_time | when was this order created | timestamp | 1701341789164 |\n",
        "pay_time | when was this order paid | timestamp | 1701341835323 | value should be 0 if not paid\n",
        "\n",
        "As you can see, those two tables are very common in E-commerce system."
      ],
      "metadata": {
        "id": "w_D2P1MhFTy0"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "### DEMO: Generate SQL"
      ],
      "metadata": {
        "id": "0UEuz1TLFBTc"
      }
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "lLaK-w-E-ZKU",
        "outputId": "148d1385-41dc-4186-d5ad-dea1f168926c"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Ask questions about data using natural language.\n",
            "Input '#exit' if you want to stop.\n",
            "------\n",
            "\n",
            "[Question]: What's the user revenue of those who came because ads in games last month?\n",
            "[Thinking Process]: \n",
            "We need to retrieve the revenue of users who came because ads in games last month.\n",
            "To do this, we need to join the 'user_info' and 'order' tables on the user identity code.\n",
            "We also need to filter the orders based on the 'reg_channel' column from the 'user_info' table, where the value is 1 (game ads) and the 'pay_time' column from the 'order' table is within the last month.\n",
            "We can calculate the revenue by multiplying the 'price' column with the 'order_item_number' column in the 'order' table.\n",
            "Finally, we need to group the revenue by user and sum it to get the total revenue of those users.\n",
            "[SQL]: \n",
            " SELECT ui.user_id, SUM(o.price * o.order_item_number) AS revenue FROM user_info ui JOIN order o ON ui.user_id = o.customer_user_id WHERE ui.reg_channel = 1 AND o.pay_time >= DATE_SUB(CURDATE(), INTERVAL 1 MONTH) GROUP BY ui.user_id\n",
            "------\n",
            "\n",
            "[Question]: How many vvip bought frozen snack this year?\n",
            "[Thinking Process]: \n",
            "I need to find out the user IDs who bought frozen snack this year\n",
            "Then, I can count the number of VVIP users among them\n",
            "[SQL]: \n",
            " SELECT COUNT(*) AS vvip_buyers FROM user_info WHERE user_id IN (SELECT DISTINCT customer_user_id FROM orders WHERE YEAR(FROM_UNIXTIME(pay_time/1000)) = YEAR(CURRENT_DATE()) AND JSON_CONTAINS(item_class_tags, 'frozen')) AND customer_level = 5\n",
            "------\n",
            "\n",
            "[Question]: #exit\n"
          ]
        }
      ],
      "source": [
        "import Agently\n",
        "from datetime import datetime\n",
        "\n",
        "agent_factory\\\n",
        "    .set_settings(\"model.OpenAI.auth\", { \"api_key\": \"\" })\n",
        "\n",
        "# Create a meta data dict of those two tables:\n",
        "## Usually we can get the meta data from our system by requesting APIs\n",
        "## instead of typing them down below manually\n",
        "meta_data = [\n",
        "    {\n",
        "        \"table_name\": \"user_info\",\n",
        "        \"columns\": [\n",
        "            { \"name\": \"user_id\", \"desc\": \"user identity code\", \"data type\": \"string\", \"example\": \"u_123456\" },\n",
        "            { \"name\": \"gender\", \"desc\": \"user gender\", \"data type\": \"int\", \"example\": 0, \"annotation\": \"0: female, 1: male\" },\n",
        "            { \"name\": \"age\", \"desc\": \"user age\", \"data type\": \"int\", \"example\": 18 },\n",
        "            { \"name\": \"customer_level\", \"desc\": \"user's customer level\", \"data type\": \"int\", \"example\": 4, \"annotation\": \"1: copper, 2: silver, 3: gold, 4: platinum, 5: vvip\" },\n",
        "            { \"name\": \"reg_time\", \"desc\": \"when did user register\", \"data type\": \"timestamp\", \"example\": 1701340135721 },\n",
        "            { \"name\": \"reg_channel\", \"desc\": \"which channel did user come from\", \"data type\": \"int\", \"example\": 0, \"annotation\": \"0: natural, 1: game ads, 2: video ads, 3: streaming live ads, 4: offline post ads\" },\n",
        "        ],\n",
        "    },\n",
        "    {\n",
        "        \"table_name\": \"order\",\n",
        "        \"columns\": [\n",
        "            { \"name\": \"order_id\", \"desc\": \"order identity code\", \"data type\": \"string\", \"example\": \"o_456789\" },\n",
        "            { \"name\": \"customer_user_id\", \"desc\": \"user identity of who paid this order\", \"data type\": \"string\", \"example\": \"u_234567\" },\n",
        "            { \"name\": \"item_name\", \"desc\": \"item's name\", \"data type\": \"string\", \"example\": \"HD Vanilla Icecream\" },\n",
        "            { \"name\": \"item_class_tags\", \"desc\": \"tags of item\", \"data type\": \"JSON String\", \"example\": \"[\\\"food\\\", \\\"snack\\\", \\\"frozen\\\"]\", \"annotation\": \"standard JSON string format\" },\n",
        "            { \"name\": \"price\", \"desc\": \"price of 1 item\", \"data type\": \"float\", \"example\": 1.35, \"annotation\": \"unit: $\" },\n",
        "            { \"name\": \"order_item_number\", \"desc\": \"how many items in this order\", \"data type\": \"int\", \"example\": 3 },\n",
        "            { \"name\": \"order_time\", \"desc\": \"when was this order created\", \"data type\": \"timestamp\", \"example\": 1701341789164 },\n",
        "            { \"name\": \"pay_time\", \"desc\": \"when was this order paid\", \"data type\": \"timestamp\", \"example\": 1701341835323 },\n",
        "        ],\n",
        "    }\n",
        "]\n",
        "## As you can see, this `meta_data` dict has a complex structure\n",
        "## But it's OK, because Agently framework can help you handle the complex structure\n",
        "## You can pass the dict to .info() or .input() like a variable directly\n",
        "## instead of trying to figure out how to prompt or how to make model understand\n",
        "\n",
        "agent = agent_factory.create_agent()\n",
        "print(\"Ask questions about data using natural language.\\nInput '#exit' if you want to stop.\\n------\\n\")\n",
        "while True:\n",
        "    user_input = input(\"[Question]: \")\n",
        "    if user_input == \"#exit\":\n",
        "        break\n",
        "    result = agent\\\n",
        "        .input(user_input)\\\n",
        "        .info({ \"database meta data\": meta_data })\\\n",
        "        .info({ \"current date\": datetime.now().date() })\\\n",
        "        .instruct(\"Generate SQL for MySQL database according {database meta data} to answer {input}\")\\\n",
        "        .output({\n",
        "            \"thinkings\": [(\"String\", \"Your thinkings step by step about how to query data to answer {input}\")],\n",
        "            \"SQL\": (\"String\", \"SQL String without explanation\"),\n",
        "        })\\\n",
        "        .start()\n",
        "    print(\"[Thinking Process]: \")\n",
        "    for thinking in result[\"thinkings\"]:\n",
        "        print(thinking)\n",
        "    print(\"[SQL]: \\n\", result[\"SQL\"])\n",
        "    print(\"------\\n\")"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "\n",
        "\n",
        "---\n",
        "\n",
        "[**_<font color = \"red\">Agent</font><font color = \"blue\">ly</font>_** Framework - Speed up your AI Agent Native application development](https://github.com/Maplemx/Agently)"
      ],
      "metadata": {
        "id": "b8QQSLDyMMN9"
      }
    }
  ]
}